graph: backend: dnnl: support select with binary primitive #2349

Jiexin-Zheng · 2025-01-07T14:48:11Z

Description

A cond input is defined for the dnnl binary op.
Because the primitive does not yet support broadcast for the cond input, the binary select primitive is used for the non-broadcast case only. The lowering logic is: always lower select to the binary primitive, then let the pass decompose_select_to_multiple_binary_ops decide which implementation path to use and decompose it into multiple binary ops if necessary.
This is unsupported on GPU, because the binary primitive does not support it there either.
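The path decision described above can be pictured with a small standalone sketch. It is not the code from this PR: `select_impl_t` and `choose_select_impl` are hypothetical names, and the sketch assumes that "non-broadcast" means cond has exactly the same dims as src0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using dims_t = std::vector<int64_t>;

// Hypothetical names for the two implementation paths described above.
enum class select_impl_t {
    single_binary_select, // one binary primitive that also takes cond
    multiple_binary_ops // select decomposed into several binary ops
};

// Hypothetical helper, not part of the library.
// Assumption: the single-primitive path is valid only when cond needs no
// broadcast, modeled here as cond having exactly the same dims as src0.
inline select_impl_t choose_select_impl(
        const dims_t &cond_dims, const dims_t &src0_dims) {
    assert(!src0_dims.empty());
    return cond_dims == src0_dims ? select_impl_t::single_binary_select
                                  : select_impl_t::multiple_binary_ops;
}
```

With the shapes from the benchmark commands, a cond of 1x12x128x128 against a src0 of 1x12x128x128 would take the single-primitive path, while a cond such as 1x1x1x128 would be decomposed.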

Performance

relative perf:

platform: Intel(R) Xeon(R) Platinum 8490H

| case | speedup |
| --- | --- |
| ./tests/benchdnn/benchdnn --graph --mode=P --reset --in-shapes=1:1x12x128x128+2:1x12x128x128 --case=complex_fusion/mha/MHA-distill_bert-inf-fp32-bs1.json | 98.24% |
| ./tests/benchdnn/benchdnn --graph --mode=P --reset --dt=bf16 --in-shapes=1:1x12x128x128+2:1x12x128x128 --case=complex_fusion/mha/MHA-distill_bert-inf-fp32-bs1.json | 170.12% |
| ./tests/benchdnn/benchdnn --graph --mode=P --reset --in-shapes=1:1x12x128x128+2:1x12x128x128 --case=complex_fusion/mha/MHA-distill_bert-inf-int8-bs1.json | 140.41% |

Jiexin-Zheng · 2025-01-07T15:40:58Z

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

src/graph/backend/dnnl/passes/lower.cpp

src/graph/backend/dnnl/kernels/sdp_decomp.cpp

src/graph/backend/dnnl/passes/lower.cpp

src/graph/backend/dnnl/passes/transform.cpp

tests/gtests/graph/unit/backend/dnnl/test_large_partition.cpp

tests/gtests/graph/unit/backend/dnnl/test_sdp_decomp.cpp

tests/gtests/graph/unit/backend/dnnl/test_select.cpp

tests/gtests/graph/unit/utils.hpp

src/graph/backend/dnnl/passes/lower.cpp

Jiexin-Zheng · 2025-01-08T13:09:31Z

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

Jiexin-Zheng · 2025-01-08T14:28:43Z

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

src/graph/backend/dnnl/passes/transform.hpp

src/graph/backend/dnnl/dnnl_shape_infer.cpp

src/graph/backend/dnnl/passes/transform.cpp

TaoLv

Do you have any performance data to share?

src/graph/backend/dnnl/kernels/sdp_decomp_config.cpp

TaoLv · 2025-01-09T03:06:40Z

src/graph/backend/dnnl/passes/transform.cpp

@@ -2266,7 +2266,8 @@ status_t binary_canonicalization(std::shared_ptr<subgraph_t> &sg) {
        int32_t src1_ndims = src1_lt.ndims;
        int32_t target_ndims = std::max(src0_ndims, src1_ndims);
        std::vector<int32_t> in_ndims {src0_ndims, src1_ndims};
-        for (size_t i = 0; i < cur_op->num_inputs(); ++i) {
+        std::vector<size_t> input_indices = {0, 1};
+        for (auto i : input_indices) {


Is this correct? Previously, num_inputs() could be 2 to 32 per the schema definition. Does the code now handle only the first two?

This pass is applied before the post-op fusion pass, so the number of inputs was always 2 before. In this PR, although binary select has three inputs, the pass decompose_select_to_binary_ops guarantees that the cond dims match those of src0, so we only need to unsqueeze src0 and src1.

If the cond dims are guaranteed to match those of src0, then cond should hit the condition if (in_ndims[i] == target_ndims) { continue; }, so no unsqueeze is inserted. In that case, is there any need to limit input_indices?

in_ndims has only two elements, so accessing a third element is out of bounds.

OK, so it seems the original code was designed for 2 elements.

> This pass is applied before the post-op fusion pass, so the number of inputs was always 2 before. In this PR, although binary select has three inputs, the pass decompose_select_to_binary_ops guarantees that the cond dims match those of src0, so we only need to unsqueeze src0 and src1.

This explanation looks suspicious, as the code relies on quite a few assumptions to work properly. You should at least add a comment about that.
BTW: I think for (size_t i : {0, 1}) { .... } should work without defining input_indices.

Fixed: I kept the original for loop and made it skip the unsqueeze step when iterating over the third input.
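To make the resolution of this thread concrete, here is a minimal standalone sketch of the loop shape that was settled on: iterate over every input, but skip any index past the two entries of in_ndims. It is not the code from transform.cpp. `unsqueeze_counts` is a hypothetical helper that returns how many dims each input would need to gain instead of inserting unsqueeze ops into a subgraph.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the loop in binary_canonicalization.
// input_ndims[i] is the ndims of input i; inputs 0 and 1 are src0 and src1,
// and an optional input 2 plays the role of cond for a binary select.
// Returns, per input, how many dims would have to be added by an unsqueeze.
inline std::vector<int32_t> unsqueeze_counts(
        const std::vector<int32_t> &input_ndims) {
    assert(input_ndims.size() >= 2);
    const int32_t src0_ndims = input_ndims[0];
    const int32_t src1_ndims = input_ndims[1];
    const int32_t target_ndims = std::max(src0_ndims, src1_ndims);
    const std::vector<int32_t> in_ndims {src0_ndims, src1_ndims};

    std::vector<int32_t> counts(input_ndims.size(), 0);
    for (size_t i = 0; i < input_ndims.size(); ++i) {
        // in_ndims has only two entries, so any further input (assumed to
        // already match src0, as discussed above) is skipped instead of
        // being read out of bounds.
        if (i >= in_ndims.size()) continue;
        if (in_ndims[i] == target_ndims) continue;
        counts[i] = target_ndims - in_ndims[i];
    }
    return counts;
}
```

For example, inputs with ndims {4, 2, 4} would leave src0 and cond untouched and add two dims to src1. The sketch keeps the assumption raised in review explicit in a comment, which is what the reviewers asked for.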

src/graph/backend/dnnl/passes/transform.cpp

src/graph/backend/dnnl/passes/utils.cpp

src/graph/interface/shape_infer.cpp

Jiexin-Zheng · 2025-01-09T08:07:42Z

> Do you have any performance data to share?

Sure, I have attached it to the PR description.

Jiexin-Zheng · 2025-01-09T10:09:23Z

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

src/graph/backend/dnnl/dnnl_op_def.hpp

src/graph/backend/dnnl/dnnl_shape_infer.cpp

Jiexin-Zheng · 2025-01-09T16:00:07Z

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

ElaineBao · 2025-01-10T01:27:26Z

src/graph/backend/dnnl/passes/transform.cpp

OK, so it seems the original code was designed for 2 elements.

Jiexin-Zheng · 2025-01-10T03:26:54Z

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

TaoLv

Please separate the benchdnn input changes into a standalone commit.

TaoLv · 2025-01-10T06:22:02Z

src/graph/backend/dnnl/passes/transform.cpp

> This pass is applied before the post-op fusion pass, so the number of inputs was always 2 before. In this PR, although binary select has three inputs, the pass decompose_select_to_binary_ops guarantees that the cond dims match those of src0, so we only need to unsqueeze src0 and src1.

This explanation looks suspicious, as the code relies on quite a few assumptions to work properly. You should at least add a comment about that.
BTW: I think for (size_t i : {0, 1}) { .... } should work without defining input_indices.

src/graph/backend/dnnl/passes/utils.cpp

Jiexin-Zheng added the component:graph-api Codeowner: @oneapi-src/onednn-graph label Jan 7, 2025

Jiexin-Zheng requested review from ElaineBao, TaoLv and gyhintel January 7, 2025 14:48

Jiexin-Zheng self-assigned this Jan 7, 2025

Jiexin-Zheng requested a review from a team as a code owner January 7, 2025 14:48

github-actions bot added the component:tests Codeowner: @oneapi-src/onednn-arch label Jan 7, 2025

Jiexin-Zheng force-pushed the jiexin-zheng/main/select_op branch from 6be21c9 to 4ea4e67 Compare January 7, 2025 15:37

TaoLv reviewed Jan 7, 2025

rongzha1 reviewed Jan 8, 2025

src/graph/backend/dnnl/passes/lower.cpp (outdated, resolved)

Jiexin-Zheng force-pushed the jiexin-zheng/main/select_op branch from 4ea4e67 to b94bafa Compare January 8, 2025 12:52

Jiexin-Zheng requested a review from a team as a code owner January 8, 2025 12:52

Jiexin-Zheng changed the title ~~graph: backend: dnnl: support select pattern and op with binary primitive~~ graph: backend: dnnl: support select with binary primitive Jan 8, 2025

Jiexin-Zheng force-pushed the jiexin-zheng/main/select_op branch from b94bafa to 458e748 Compare January 8, 2025 14:28

dzarukin approved these changes Jan 8, 2025

src/graph/backend/dnnl/passes/transform.hpp (outdated, resolved)

gyhintel reviewed Jan 9, 2025

src/graph/backend/dnnl/dnnl_shape_infer.cpp (resolved)

src/graph/backend/dnnl/passes/transform.cpp (outdated, resolved)

TaoLv reviewed Jan 9, 2025

Jiexin-Zheng force-pushed the jiexin-zheng/main/select_op branch from 458e748 to f8262e0 Compare January 9, 2025 10:04

ElaineBao reviewed Jan 9, 2025

src/graph/backend/dnnl/dnnl_op_def.hpp (outdated, resolved)

src/graph/backend/dnnl/dnnl_shape_infer.cpp (resolved)

gyhintel approved these changes Jan 9, 2025

Jiexin-Zheng force-pushed the jiexin-zheng/main/select_op branch from f8262e0 to 325bca9 Compare January 9, 2025 15:58

ElaineBao approved these changes Jan 10, 2025

Jiexin-Zheng force-pushed the jiexin-zheng/main/select_op branch from 325bca9 to 6694b8c Compare January 10, 2025 03:26

TaoLv reviewed Jan 10, 2025

Jiexin-Zheng added 2 commits January 10, 2025 06:48

graph: backend,interface: add select binary impl

9b5bbe6

benchdnn: graph: add select broadcast cases

66e2b1f

Jiexin-Zheng force-pushed the jiexin-zheng/main/select_op branch from 6694b8c to 66e2b1f Compare January 10, 2025 06:56
